
Anirud/multi modal model support #169

Open
anirudTT wants to merge 31 commits into dev from anirud/multi-modal-model-support

Conversation

@anirudTT (Contributor) commented Feb 5, 2025

Changelog

  • Introduces support for the 11B Llama Vision model, with changes to the model request prompt to accept images as URLs or as base64.
  • Improves the RAG table by showing PDF file-name metadata.
  • Resolved input area resizing issues to ensure consistent UX across different screen sizes.
  • Added functionality to show which RAG context is selected in the chat, enhancing clarity and interaction.
  • Images shared in the chat are now visible and can be resized if needed.
  • A default prompt is passed when the text is empty and an image is attached; if text is provided, it is passed through as-is.
  • Added the ability to pass text-based files (Markdown, shell scripts, source code, etc., but not PDFs), whose contents are extracted and passed to the LLM as RAG context, with support for multiple files.

  • RAG Pill displayed in chat thread
Screenshot 2025-02-25 at 3 57 36 PM
  • Default prompt is not used when the user supplies a prompt
Screenshot 2025-02-25 at 3 57 48 PM
  • Multiple code/text-based files can be passed through to the LLM
Screenshot 2025-02-25 at 3 57 15 PM
  • Attaching/uploading a code file and asking the LLM to explain the code
Screenshot 2025-02-25 at 4 14 57 PM

Known Issues / In progress

  • In the same chat thread, if a second question is asked about an image pasted in a previous message, the image isn't correctly passed through to the prompt templating.
  • In ChatUI, add a warning prompting users to use RAG when uploading PDFs, or connect PDF uploads to the existing RAG table flow via the input area.
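The text-file-to-RAG-context feature in the changelog above might look like this minimal sketch. The function name and extension whitelist are assumptions for illustration; the PR only states that Markdown, shell, and other code/text files are supported and PDFs are excluded.

```python
from pathlib import Path

# Hypothetical whitelist; the PR mentions Markdown, shell scripts, code, etc.
TEXT_EXTENSIONS = {".md", ".sh", ".py", ".txt", ".js", ".ts", ".json"}

def files_to_rag_context(paths: list[str]) -> str:
    """Concatenate the contents of text-based files into one context string
    for the LLM, skipping unsupported types such as PDFs."""
    chunks = []
    for p in map(Path, paths):
        if p.suffix.lower() not in TEXT_EXTENSIONS:
            continue  # e.g. .pdf: handled by the separate RAG table flow
        chunks.append(f"--- {p.name} ---\n{p.read_text(encoding='utf-8')}")
    return "\n\n".join(chunks)
```

Labeling each chunk with its file name keeps multiple files distinguishable in the prompt, matching the multi-file support described above.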

@anirudTT mentioned this pull request Feb 6, 2025
@anirudTT marked this pull request as ready for review February 7, 2025 16:16

- also add background blur when image is open in max view
- add z-index for the image dialog box
* copy run agent container + helper func

* copied in updated docker views

* copied in model utils to stream agent response

* copy in model views

* added agent view

* copy in all frontend components

* add search api key to docker compose yml

* copy in updated model urls

* added requirements for dockerfile

* rename hf_model_id

* remove commented-out code interpreter tool

* added fix so agent works with other llama models

* fix requirements in dockerfile

* add thread id to match stateful chat

* add readme

* add agent workflow diagram

* Update README.md

* Delete app/api/agent_control/Agent.png

* Add files via upload

* Delete app/api/agent_control/Agent.png

* Add files via upload

* Update README.md

* Delete app/api/agent_control/Agent.png

* Add files via upload

* Delete app/api/agent_control/Agent.png

* Add files via upload

* Update README.md

* fix link href (#180)

* refactor(chat history component): improve file handling and add RAG support

- Add RAG datasource integration with metadata display
- Create reusable FileDisplay component for file management
- Implement FileViewerDialog for improved file preview experience
- Support both image and non-image file types with download option
- Clean up file handling logic and separate from image-specific code
- Add visual indicator for RAG-enabled messages

* Show RAG pill based on the message's stored RAG context

* feat(add support in chat component):
- Use the RAG datasource from the message if available

* move image display to its own component

* include rag source name when selected

* refactor(types): clean up and organize type definitions
- Remove redundant and commented-out interfaces
- Group related interfaces together (chat, inference, file, voice)
- Add proper JSDoc comments for better documentation
- Consolidate duplicate type definitions
- Add explicit typing for RAG-related interfaces

* add pdfjs-dist to test

* feat: improve file display
- show images in better aspect ratio

* display file display for images, code files, and other file types

* add icons for file display in chat thread

* extend types

* extend to add
- File extensions mapping for code files and other file types

* extend to allow for files to be passed as text

* fix alignment

* limit upload to a single image file

* set focused state in input area

* feat: add ability to process multiple code and/or text file types and send them to the model

* re-add resizing input area

* fix copy button logic

* Anirud/update vllm setup steps (#189)

* update readme to reflect new flow

* fix readme issues

* add Supported models tab:
pointing to tt-inference-server readme

* docs: Update main readme
- add better quick start guide 
- add better notes for running in development mode

* docs: re-add Mock model steps

* docs: fix links

* docs: fix vllm

* Update HowToRun_vLLM_Models.md


* Update HowToRun_vLLM_Models.md
@anirudTT anirudTT force-pushed the anirud/multi-modal-model-support branch from 4c39c27 to f8314c9 Compare February 25, 2025 19:26
@anirudTT anirudTT linked an issue Feb 27, 2025 that may be closed by this pull request
@anirudTT anirudTT requested a review from bgoelTT February 27, 2025 23:13